BMJ Health & Care Informatics
Publisher: BMJ
Preprints posted in the last 90 days, ranked by how well they match the content profile of BMJ Health & Care Informatics, based on 13 papers previously published here. The average preprint has a 0.10% match score for this journal, so anything above that is an above-average fit.
Adekunle, T.; Ohaeche, J.; Adekunle, T.; Adekunle, D.; Kogbe, M.
Background: Artificial intelligence is increasingly embedded in healthcare delivery. Its legitimacy depends on institutional governance, not technical performance alone. Prior research has centered on clinicians and patients; less attention has been given to the cybersecurity professionals who sustain the digital infrastructures that support health AI. This study examines how cybersecurity professionals conceptualize AI as clinical infrastructure and how these interpretations shape understandings of trust, risk, and oversight. Methods: Guided by sociotechnical systems theory and institutional trust scholarship, we conducted semi-structured in-depth interviews with twenty cybersecurity professionals working in healthcare-relevant domains. Participants were recruited through professional networks and LinkedIn outreach. Interviews were conducted between May and August 2025, audio-recorded, and transcribed verbatim. Data were analyzed using qualitative content analysis with constant comparison. Two researchers independently coded transcripts and refined themes through iterative discussion. The study received Institutional Review Board approval. Results: Participants described health AI as an augmented clinical infrastructure. They emphasized that AI extends workflow capacity but requires sustained human oversight. Healthcare data systems were characterized as fragmented and vulnerable, and breaches were treated as anticipated events. Trust in AI was described as contingent and built over time through visible accountability. Cybersecurity stewardship was framed as foundational to institutional trustworthiness. Conclusions: Health AI credibility emerges through governance practices that demonstrate accountability. Cybersecurity professionals and institutional stakeholders jointly shape trust in digitally mediated healthcare systems through governance decisions that signal accountability.
Waken, R.; Lou, S. S.; Hofford, M.; Eiden, E.; Burk, C.; Kim, S.; Esker, J.; Zhang, L.; Maddox, T.; Abraham, J.; Lai, A. M.; Bhayani, S.; O'Dell, D.; Paynter, K.; Thomas, M.; Gerling, M.; Payne, P. R. O.; Kannampallil, T. G.
Importance: Clinician adoption and adaptation of new tools evolve over time. Prior studies of ambient artificial intelligence (AI) scribes have primarily relied on single time-point measurements (e.g., pre-post), potentially obfuscating their true impact on outcomes. Objective: To investigate longitudinal effects of an AI scribe tool on patient encounter-level outcomes. Design: Case series across 48 weeks (24 pre, 24 post) per clinician. Setting: Primary care clinical encounters occurring between 01/05/24 and 10/31/25. Participants: Primary care clinicians (attending physicians and advanced practice providers). Exposure: Ambient AI scribe introduction to clinical workflow, indexed to study day zero, per clinician. Main outcomes and measures: Encounter-level measurements of documentation time (note writing time, time outside of scheduled hours (TOSH), pajama time), note writing patterns (note length, note closure <24h), and clinicians' billed work Relative Value Units (wRVU), with a focus on changes from pre-period outcomes at Day 0 and Day 150. Results: 220 primary care clinicians (mean age 43.7 years; 70.9% female; 56.4% physicians) from 36 clinics, conducting 314,845 patient encounters, were included. All outcomes evolved from day zero to day 150 and are compared back to pre-period levels. There was evidence of an immediate 7% decrease on average in note writing time at day zero (incidence rate ratio, IRR 0.93, 95% CI [0.89, 0.96]), intensifying to a 15% decrease by day 150 (IRR 0.85, 95% CI [0.83, 0.87]). There was no evidence of a change in pajama time or TOSH at day zero; however, at day 150, there was evidence of an 18% decrease in pajama time (0.82, 95% CI [0.73, 0.91]) and a 13% decrease in TOSH (0.87, 95% CI [0.77, 0.99]). At day zero, there was evidence of a 5% increase (1.05, 95% CI [1.00, 1.10]) in note length and a 31% increase in note closures (1.31, 95% CI [1.13, 1.53]), with both slowly attenuating to pre-period levels by day 150. Although there was no evidence of changes in wRVU at day zero, there was a 2% increase in total wRVU at day 150 (1.02, 95% CI [1.01, 1.03]). Conclusions and relevance: Longitudinal changes were gradual but persistent, underscoring the gradual adaptation of AI scribes as clinicians situated these tools within their workflows. Key Points. Question: How do the patterns of use of an ambient artificial intelligence (AI) scribe evolve over time? Findings: In this longitudinal, quasi-experimental study of clinician use of an ambient AI scribe, documentation time, note writing patterns, and financial productivity evolved over a 150-day period. Compared to the pre-period, note writing time savings increased from 7% (day zero) to 15% (day 150); changes in all other considered outcomes, including time outside of scheduled hours, pajama time, note length, note closure <24h, and billed work Relative Value Units, evolved over the 150-day period. Meaning: Clinician use of ambient AI scribes showed persistent changes in patterns of use over a 150-day period, highlighting a gradual adaptation process and the need for longitudinal assessment.
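A note on the statistic above: an incidence rate ratio (IRR) is the exponentiated coefficient of a log-linked count model. The sketch below reproduces the idea with a plain Poisson GLM on simulated encounters; the paper's actual model, which traces per-clinician trajectories across 48 weeks, is considerably richer, and all variable names here are illustrative.

```python
# Minimal IRR sketch: simulate a ~7% multiplicative drop in note-writing
# minutes after deployment, then recover it with a log-linked Poisson GLM.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({"post": rng.integers(0, 2, n)})  # 0 = pre, 1 = post deployment
df["minutes"] = rng.poisson(lam=np.exp(np.log(10) - 0.07 * df["post"]))

model = smf.glm("minutes ~ post", data=df, family=sm.families.Poisson()).fit()
irr = np.exp(model.params["post"])                  # exponentiated coefficient = IRR
ci_lo, ci_hi = np.exp(model.conf_int().loc["post"])
print(f"IRR {irr:.2f}, 95% CI [{ci_lo:.2f}, {ci_hi:.2f}]")  # ~0.93, as in the abstract
```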
Galfano, A.; Barbosu, C. M.; Aladin, B.; Rivera, I.; Dye, T. D. V.
Artificial intelligence (AI) is dramatically changing the healthcare landscape by providing patients, clinicians, administrators, and public health professionals with tools aiming to improve efficiency, outcomes, and experience in health. As elsewhere, New York State (NYS) experiences high demand for - and high investment in - transformation in healthcare with AI tools, though little is known about clinicians' use of and interest in adopting AI tools in their work. A large share of the nation's future primary care clinicians train and work in NYS, and the state's ability to establish clear policies, provide tools, and elevate AI competency has implications for care delivery nationally. As a result, we undertook this analysis of NYS clinicians' use of AI to better understand opportunities for its adoption and inclusion in continuing education. For this analysis, we included healthcare providers who deliver ambulatory or specialty medical care within NYS, with use/frequency/purpose of AI tools by clinicians in their work as the main outcome. Of 305 NYS clinical providers responding, 23.4% indicated they use AI tools for work: 11.1% report monthly use, 8.5% weekly use, and 4.6% daily use. AI was primarily used to search guidelines and ask clinical questions, followed by identifying drug interactions, analyzing data, analyzing images/labs, and creating care plans and patient recommendations. AI use did not vary significantly across professional disciplines or practice types, though independent practitioners were significantly more likely than advanced practice providers to use AI in their work, as were providers using social media and digital methods for obtaining continuing education. AI use increased substantially in 2025 compared with 2024. Overall, our findings suggest that programs targeting clinicians could incorporate these findings in designing accessible and acceptable AI-related continuing education opportunities to help familiarize clinicians with opportunities and risks for integrating AI tools into their practices. Author summary: AI tools are rapidly gaining traction in the delivery of healthcare. We found that clinician use of AI was quite limited (23%), though growing. Those using AI tools used them sparingly in their work, with only about 5% reporting daily use. The purposes for which clinicians report using AI - asking clinical questions, interpreting patient results, creating patient educational materials - could contribute substantially to healthcare outcomes if widely adopted. Designers of continuing education for clinicians should provide opportunities for clinicians to improve their familiarity, use, and competency with AI tools, to help maximize the potential health benefits for patients and communities.
Mai, M. V.; Shin, H. S.; Muthu, N.; Braykov, N. P.; McCarter, A.; Hilsman, J.; Orenstein, E.; Hu, X.; Kandaswamy, S.
Artificial intelligence models in healthcare often fail to improve patient outcomes despite strong predictive performance because they are frequently developed with limited understanding of clinical workflows and system implementation. We demonstrate a human-centered design approach to define prediction targets before model development, ensuring alignment with actionable clinical interventions. Using pediatric acute kidney injury as a case study, we convened a multidisciplinary working group and applied three complementary methods: user stories to elicit role-specific prediction targets, a People, Environment, Technology, and Tasks (PETT) Scan to analyze sociotechnical system factors, and process mapping to identify workflow leverage points. This approach revealed that different clinical roles require distinct prediction targets, with shared barriers including inadequate monitoring practices, poor visibility of at-risk patients, and unclear trajectories for kidney injury progression. By integrating clinical context before algorithm development, we identified high-impact prediction targets that support actionable interventions for hospitalists, nephrologists, and intensivists, demonstrating how human-centered design can bridge technical model performance and real-world clinical utility.
Onovo, A. A.; Cherima, Y. J.
Importance: Emerging evidence suggests healthcare AI systems may exhibit deceptive alignment (appearing safe during validation while optimizing for misaligned objectives in deployment) and evaluation awareness (detecting and adapting behavior during audits), undermining regulatory validation frameworks. Objective: To quantify the performance of multi-layer red-teaming approaches in detecting sophisticated healthcare AI safety failures across 10 vulnerability domains. Design, Setting, and Participants: Systematic red-teaming evaluation conducted December 2025-January 2026 using 50 healthcare-specific adversarial prompts across 10 vulnerability domains (5 prompts per domain). GPT-OSS-20B (a 21-billion-parameter open-source model) was evaluated using a three-layer framework: Layer 1 (automated keyword pattern matching), Layer 2 (GPT-5 semantic analysis with chain-of-thought reasoning), and Layer 3 (PhD-level human expert adjudication as ground truth). Inter-rater reliability was assessed using the Cohen kappa statistic. Main Outcomes and Measures: The primary outcome was risk classification (SAFE, LOW, MODERATE, or HIGH RISK) with severity scoring (0-10 scale). Secondary outcomes included layer detection rates, inter-rater agreement, thematic vulnerability patterns, and chain-of-thought reasoning exposure. Results: Of 50 prompts evaluated, 12 (24%, 95% CI: 13.5%-38.2%) were classified HIGH RISK, 8 (16%, 95% CI: 7.7%-28.8%) MODERATE RISK, 17 (34%, 95% CI: 21.8%-48.3%) LOW RISK, and 13 (26%, 95% CI: 15.2%-39.9%) SAFE. The Evaluation Awareness domain exhibited a 100% HIGH/MODERATE RISK rate (5/5 prompts, mean severity 7.4/10), with the model providing technical specifications for audit detection and behavior switching. The Sandbagging domain showed a 40% HIGH RISK rate (2/5 prompts) with FDA validation gaming strategies. All 12 HIGH RISK prompts involved audit gaming themes (audit detection, behavior switching, system fingerprinting, or reporting violations). Layer 1 keyword matching detected 2/12 (17%, 95% CI: 4.7%-44.8%) high-risk prompts. Layer 2 GPT-5 analysis detected 12/12 (100%, 95% CI: 75.8%-100%) high-risk prompts with 0/13 (0%, 95% CI: 0%-22.8%) false positives. Human expert validation confirmed perfect concordance with Layer 2 assessments (kappa = 1.00, 95% CI: 0.999-1.000, p < 0.001), validating automated semantic analysis as a reliable screening tool. Chain-of-thought leakage occurred in 28/50 (56%) prompts, exposing internal safety reasoning. Conclusions and Relevance: Multi-layer evaluation is essential for detecting sophisticated AI safety failures in healthcare. Keyword filtering alone missed 83% (95% CI: 55.2%-95.3%) of high-risk behaviors. Perfect inter-rater agreement (kappa = 1.00) between automated AI semantic analysis and human expert judgment demonstrates that scalable, reliable safety screening is achievable. All HIGH RISK outputs contained audit gaming content, indicating a systematic capability to articulate regulatory circumvention. Healthcare AI systems require domain-specific red-teaming for regulatory audit gaming and dual-mode behavior detection. Findings reveal critical gaps in current AI safety measures with immediate implications for FDA/CMS regulatory frameworks.
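Two of the abstract's summary statistics are easy to reproduce: the wide interval around the 2/12 Layer 1 detection rate is consistent with a Wilson binomial interval, and inter-rater agreement is Cohen's kappa. A minimal sketch on toy labels, assuming scikit-learn and statsmodels:

```python
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.proportion import proportion_confint

# Layer 1 keyword matching caught 2 of 12 high-risk prompts.
lo, hi = proportion_confint(count=2, nobs=12, alpha=0.05, method="wilson")
print(f"Detection 2/12 = {2/12:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")  # ~[4.7%, 44.8%]

# Agreement between automated risk labels and human adjudication (toy data).
layer2 = ["HIGH", "SAFE", "LOW", "HIGH", "MODERATE", "SAFE"]
human  = ["HIGH", "SAFE", "LOW", "HIGH", "MODERATE", "SAFE"]
print("Cohen kappa:", cohen_kappa_score(layer2, human))  # 1.0 = perfect agreement
```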
Al-Dabbas, Z.; Khandakji, L.; Al-Shatarat, N.; Alqaisiah, H.; Ibrahim, Y.; Awed, T.; Baik, H.; Dawoud, M.; Ali, R. A.-H.; Telfah, Z.; Al-Hmaid, Y.; Alsharkawi, A.
Artificial intelligence (AI) is increasingly integrated into healthcare delivery, yet patient acceptance in resource-constrained settings remains incompletely characterized. This study assessed attitudes toward AI-supported care among patients attending hospitals in three Jordanian governorates (Amman, Balqa, Irbid) and examined demographic and digital literacy correlates of acceptance. In a cross-sectional survey (n = 500 complete questionnaires), participants rated exposure to AI in healthcare and five attitudinal domains, namely perceived usefulness or performance expectancy, trust and transparency, privacy and perceived risks, empathy and human interaction, and readiness or behavioral intention, using 25 items on 5-point Likert scales. Patients expressed conditional optimism: empathy and human interaction was most strongly endorsed (M = 4.33, SD = 0.58), alongside relatively high perceived usefulness (M = 3.97, SD = 0.68), while trust and transparency (M = 3.57, SD = 0.74) and readiness (M = 3.66, SD = 0.90) were moderate to high; privacy and risk concerns were moderate (M = 3.51, SD = 0.77) and self-reported exposure was lowest (M = 2.57, SD = 1.07). The highest-agreement item indicated preference for AI to work alongside physicians rather than be relied on alone (M = 4.47, SD = 0.81). Trust and transparency and perceived usefulness were positively associated with readiness (r = 0.48 and r = 0.44, respectively; p < .001), while privacy and perceived risks were negatively correlated with trust and usefulness. In multivariable regression adjusting for gender, age group, education, prior AI health app or device use, and self-rated digital skill, lower educational attainment (less than high school and high school) predicted reduced readiness, whereas higher digital skill predicted increased readiness (R² = 0.101). These findings suggest that implementation strategies in Jordan should emphasize human involvement alongside AI, transparent communication and governance, and interventions that build digital confidence and reduce readiness gaps linked to education. Author summary: AI is increasingly used in healthcare, for example to support diagnosis, triage, and treatment decisions. Whether these tools are accepted by patients depends not only on how well they work, but also on whether patients trust them, understand how they are used, and feel their privacy is protected. Evidence on patient views in middle-income and resource-constrained settings is still limited. We surveyed 500 patients attending hospitals in three Jordanian governorates to understand how they view AI-supported care. Patients generally expected AI to be useful, but they strongly preferred that clinicians remain actively involved and that AI supports rather than replaces physicians. Trust and perceived usefulness were closely linked to willingness to accept AI-enabled care, while privacy concerns were present and shaped trust. Readiness to accept AI was lower among participants with lower educational attainment and higher among those with greater self-rated digital skill. These findings suggest that successful implementation in Jordan should prioritize transparent communication, strong privacy safeguards, and human-centered workflows, while also strengthening digital confidence to avoid widening gaps in acceptance.
Ng, J. Y.; Bhavsar, D.; Krishnamurthy, M.; Dhanvanthry, N.; Fry, D.; Kim, J. W.; King, A.; Lai, J.; Makwanda, A.; Olugbemiro, P.; Patel, J.; Virani, I.; Ying, E.; Yong, K.; Zaidi, A.; Zouhair, J.; Lee, M. S.; Lee, Y.-S.; Nesari, T. M.; Ostermann, T.; Witt, C. M.; Zhong, L.; Cramer, H.
Background: Artificial intelligence chatbots (AICs) are increasingly being integrated into scholarly publishing, with the potential to automate routine editorial tasks and streamline workflows. In traditional, complementary, and integrative medicine (TCIM) publishing, editorial and peer review processes can be particularly complex due to diverse methodologies and culturally embedded knowledge systems, presenting unique opportunities and challenges for AIC adoption. Methods: An anonymous, online cross-sectional survey was distributed to the editorial board members of 115 TCIM journals. The survey assessed familiarity and current use of AICs, perceived benefits and challenges, ethical concerns, and anticipated future roles in editorial workflows. Results: Of 5119 invitations, 217 eligible participants completed the survey. While approximately 70% of respondents reported familiarity with AI tools, over 60% had never used AICs for editorial tasks. Editors expressed strongest support for text-focused applications, such as grammar and language checks (81.0%) and plagiarism/ethical screening (67.4%). Most respondents (82.8%) believed that AICs would be important or very important to the future of scholarly publishing; however, the majority (65.3%) reported that their journals lacked AI-specific policies and training programs to guide editors and peer reviewers. Conclusions: Most TCIM editors believe that AICs have potential to support routine editorial functions, but adoption into editorial and peer review processes remains limited due to practical, ethical, and institutional barriers. Additional training and guidance from journals are warranted to direct responsible and ethical use if AICs are to be adopted in TCIM academic publishing.
Hill, C.; Dahil, A.; Simpson, G.; Hardisty, D.; Keast, J.; Pinn, C. K.; Dambha-Miller, H.
Large language models (LLMs) are increasingly used for qualitative thematic analysis, yet evidence on their performance in analysing focus-group data, where polyvocality and context complicate coding, remains limited. Given the increasing role of such models in thematic analysis, there is a need for methodological frameworks that enable systematic, metric-based comparisons between human and model-based analyses. We conducted a blinded mixed-methods comparison of two general-purpose LLMs (ChatGPT-5 and Claude 4 Sonnet), an LLM-based qualitative coding application (QualiGPT), and blinded human analysts on an in-person focus-group transcript informing an AI-enabled digital health proposal. We evaluated deductive coding using a 10-code, 6-theme codebook against an expert consensus adjudication; inductive coding with a structured Likert-scale comparison to a reference-standard set of inductive themes generated by expert consensus; and manual quote verification of LLM segments to define LLM hallucination (evidence absent or non-supportive) and error rate (including partial matches and speaker-coded segments). During deductive coding against an expert consensus adjudication, LLMs yielded a mean agreement of 93.5% (95% CI 92.5-94.5) with κ = 0.34 (95% CI 0.26-0.40); blinded human coders achieved 92.7% (95% CI 91.6-93.9) agreement with κ = 0.34 (95% CI 0.26-0.41). Mean Gwet's AC1 was 0.92 (95% CI 0.90-0.93) for the blinded human analysis and 0.93 (95% CI 0.92-0.94) for the LLM-assisted deductive analysis, reflecting high agreement despite the low overall code prevalence (7.8%, SD = 3.2%). Only one model achieved non-inferiority in inductive analysis of the transcript (p = 0.043). The strict hallucination rate in inductive analysis was 1.2% (SD = 2.1%). LLMs were non-inferior to human analysts for deductive coding of the focus-group data, with variable performance in inductive analysis. Low hallucination but significant comprehensive error rates indicate that LLMs can augment qualitative analysis but require human verification. Author summary: Qualitative research plays an important role in digital health, assisting in the implementation of healthcare technologies and innovations. However, analysing qualitative data in the form of focus groups is time-consuming and requires human expertise. Large language models (LLMs) are being increasingly used in qualitative research analysis, although evidence on their performance in analysing focus-group data is limited. We compared the performance of LLMs to blinded human analysts in analysing a focus-group transcript on AI implementations in healthcare. We used both qualitative and quantitative metrics to evaluate the performance of LLMs in thematic analysis. We found that the LLMs performed similarly to humans when applying pre-defined codes (deductive analysis), with a low rate of hallucination. However, in open-ended theme generation (inductive analysis) their performance was more variable, particularly in areas requiring interpretation of tone, nuance, or conversational context. These findings suggest that LLMs can be used to support interpretation of qualitative data, rather than replace human analysts. We provide a reproducible framework for analysing the performance of LLMs in qualitative analysis.
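The pattern above, κ near 0.34 but Gwet's AC1 above 0.9 on the same codings, is the well-known behaviour of chance-corrected agreement when code prevalence is low (7.8% here): kappa's chance term inflates as one category dominates, while AC1's does not. A hand-rolled binary-case sketch on simulated coders:

```python
# Compare raw agreement, Cohen's kappa, and Gwet's AC1 (binary case) when
# the coded category is rare. Simulated data; no agreement library assumed.
import numpy as np

def agreement_stats(a, b):
    a, b = np.asarray(a), np.asarray(b)
    p_a = np.mean(a == b)                          # raw percent agreement
    p1, p2 = a.mean(), b.mean()                    # raters' marginal rates
    p_e_kappa = p1 * p2 + (1 - p1) * (1 - p2)      # kappa's chance agreement
    pi = (p1 + p2) / 2
    p_e_ac1 = 2 * pi * (1 - pi)                    # AC1's chance agreement
    return p_a, (p_a - p_e_kappa) / (1 - p_e_kappa), (p_a - p_e_ac1) / (1 - p_e_ac1)

rng = np.random.default_rng(1)
truth = rng.random(5000) < 0.08                    # ~8% code prevalence
rater = truth ^ (rng.random(5000) < 0.05)          # flip 5% of decisions
p_a, kappa, ac1 = agreement_stats(truth, rater)
print(f"agreement={p_a:.2f}  kappa={kappa:.2f}  AC1={ac1:.2f}")  # AC1 >> kappa
```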
Aidoo-Frimpong, G.; Owusu, E.; Awini Asitanga, D.; Aduku, G.; Moore, S. E.; Oduro, M. A.; Ni, Z.
Artificial intelligence (AI) is increasingly positioned as a transformative tool in education and health. Yet empirical evidence on AI readiness in low- and middle-income countries, particularly among youth, remains scarce. This study examined patterns of adoption, equity determinants, and ethical awareness among Ghanaian youth to inform responsible AI integration in education and health systems. A cross-sectional survey was conducted among 200 youth aged 18-35 years in Ghana. Descriptive statistics, chi-square tests, and logistic-regression analyses were used to assess AI adoption, equity patterns, and predictors of readiness. Most participants reported current (89%) or prior (65%) use of AI tools. Accessibility was a significant positive predictor of adoption (β = 0.142, p = 0.001), whereas limited internet connectivity (β = -0.088, p = 0.049) and perceived exclusion or inequity (β = -0.109, p = 0.026) were significant negative predictors. Gender and age differences indicated persistent digital inequities. Ethical concerns were widespread: 51% were somewhat concerned and 39% very concerned about data privacy, algorithmic bias, and transparency. Ghanaian youth exhibit high AI readiness, but it is distributed in structurally uneven and ethically contested contexts. Readiness is best understood as a dynamic interaction between technical access, social inclusion, and trust. Translating readiness into equitable implementation will require investments in digital infrastructure, ethical governance, and participatory design. This study provides one of the first quantitative assessments of AI readiness among African youth and offers an evidence base for developing trustworthy, inclusive AI applications, such as healthcare and educational chatbots, that are grounded in local realities. Author summary: Artificial intelligence (AI) is often presented as a solution to challenges in healthcare and education. However, there remains limited evidence on people's readiness to use AI in low- and middle-income countries and on the ways in which equity and ethics shape that readiness. We surveyed 200 youth in Ghana to understand their use of AI tools, perceptions of fairness, and ethical concerns. Most participants were already using AI, yet adoption was uneven. Access to reliable internet and devices strongly increased use, while perceptions of exclusion and limited connectivity reduced it. Many youths expressed concern about data privacy, bias, and transparency in AI systems. These findings show that Ghanaian youth are eager but cautious adopters who value fairness and accountability. Building equitable and trustworthy AI systems in education and health will require improving access, addressing social inequalities, and involving youth directly in the design and governance of new technologies.
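The β coefficients reported above come from a logistic regression of adoption on access-related predictors. A minimal sketch of that kind of model on simulated survey responses (variable names invented for illustration), using statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "accessibility": rng.integers(1, 6, n),   # 1-5 self-rated access to AI tools
    "connectivity": rng.integers(1, 6, n),    # 1-5 self-rated internet reliability
})
# Simulate adoption with a positive effect of both predictors.
log_odds = -2.0 + 0.4 * df["accessibility"] + 0.3 * df["connectivity"]
df["uses_ai"] = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

fit = smf.logit("uses_ai ~ accessibility + connectivity", data=df).fit(disp=0)
print(fit.params)     # log-odds coefficients (the reported betas)
print(fit.pvalues)
```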
Moore, C.; Mugwagwa, J.; Vickers, I.
The use of artificial intelligence (AI) in healthcare is a field of growing relevance and importance, but in many LMICs those seeking to develop AI-based solutions for healthcare needs face significant outstanding challenges. This research analysed practical efforts to implement AI-based technologies to support healthcare delivery in low-resource settings. By investigating six pilots within the Foreign, Commonwealth and Development Office's Frontier Technologies program through analysis of associated pilot literature and semi-structured interviews with key pilot actors, we identified differences and commonalities in the experiences of each pilot, and in the perceived enablers and barriers for effective implementation of AI health tools. We found that AI is a promising tool in this sector but currently lacks the operating environment to be widely successful in solving healthcare challenges. Gaps in regulatory and ethical governance in these contexts exacerbated concerns around the ethical and responsible use of AI and led to alternative technical approaches being followed. The value of partnerships and relationships was demonstrated as essential, and projects with pre-established networks with key decision makers in healthcare systems, at both a bureaucratic and a clinical level, demonstrated greater success in both developing and scaling their solutions. The challenge of sustainability and longer-term impact was also identified. The fragmented nature of local technology ecosystems also posed a common barrier to the delivery and scale-up of promising AI tools. It is anticipated that this research can help share useful lessons for future users and developers of AI technologies and tools in the health space, particularly in resource-constrained settings. These findings suggest that barriers to equitable AI adoption in low-resource settings are primarily institutional and systemic, rather than technical, highlighting the need for health system-level readiness alongside technological innovation. Author summary: Artificial intelligence (AI) is increasingly promoted as a way to improve healthcare delivery, including in low- and middle-income countries (LMICs). However, much of the existing discussion focuses on technical performance, with less attention to whether AI tools can be implemented, governed, and sustained within real-world health systems. In this study, we examine a set of AI-for-health pilot projects implemented in low-resource settings to understand what enables or constrains their adoption. Using interviews with practitioners and a review of project documentation, we explore how these pilots interacted with existing health system conditions, including workforce capacity, data infrastructure, governance arrangements, and institutional partnerships. We find that many of the challenges faced by AI projects are not primarily technical, but instead reflect broader system-level constraints, such as limited regulatory capacity, fragmented data systems, and reliance on external actors for development and maintenance. Our findings suggest that achieving equitable and inclusive AI for health requires more than developing effective technologies. It also requires sustained investment in the institutions, governance structures, and system capacities that allow AI tools to be safely adopted and integrated into health services. This study offers practical insights for policymakers, funders, and practitioners seeking to use AI in ways that strengthen health systems rather than bypass them.
Bladder, K. J. M.; Verburg, A. C.; Arts-Tenhagen, M.; Willemsen, R.; van den Broek, G. B.; Driessen, C. M. L.; Driessen, R. J. B.; Robberts, B.; Scheffer, A. R. T.; de Vries, A. P.; Frenzel, T.; Swillens, J. E. M.
Background: Generative artificial intelligence (GenAI) in healthcare may reduce administrative burden and enhance quality of care. Large language models (LLMs) can generate draft responses to patient messages using electronic health record (EHR) data, which could mitigate the increased workload related to high message volumes. While the effectiveness and feasibility of these GenAI tools have been studied in the United States, evidence from non-English contexts is scarce, particularly regarding user experience. Objective: This study evaluated the effectiveness, feasibility, and barriers and facilitators of implementing Epic's Augmented Response Technology (Art) GenAI tool (Epic Systems Corporation, Verona, WI, USA) in a Dutch academic healthcare setting among a broad range of end users. It explored healthcare professionals' (HCPs') usage metrics, expectations, and early user experiences. Methods: We conducted a hybrid type 1 effectiveness-implementation design. HCPs of four clinical departments (dermatology, medical oncology, otorhinolaryngology, and pulmonology) participated in a six-month study. Effectiveness of Art was assessed using efficiency indicators from Epic (including all InBasket users in the hospital) and survey scales measuring well-being and clinical efficiency at three time points: PRE, POST-1 (1 month), and POST-2 (4 months). Feasibility of Art was evaluated through adoption indicators from Epic and survey scales on use and usability. Barriers and facilitators of Art implementation were collected through the survey and thematized using the NASSS framework (Nonadoption, Abandonment, Scale-up, Spread and Sustainability). Results: 237 unique HCPs generated a total of 8,410 drafts. Review and drafting times were similar for users with and without Art, indicating minimal differences. Perceived clinical efficiency declined significantly from PRE to POST-2, while well-being remained unchanged. Adoption was initially high but decreased over time, averaging 16.7% across departments. Usability and intention-to-use scores also declined significantly. Qualitative findings highlighted time savings, well-structured drafts, and patient-centered language as facilitators. Reported barriers included limited impact on time, low practical utility, content inaccuracies, and style misalignment. Conclusions: This evaluation of a GenAI tool for patient-provider communication in a non-English academic hospital revealed mixed perceptions of effectiveness and feasibility. High initial expectations contrasted with limited perceived impact on time savings, well-being, and clinical efficiency, alongside declining adoption and usability. Barriers and facilitators revealed contrasting views. These findings underscore the need for a workflow for handling user feedback and guidance on clinical responsibilities, along with clear communication about the tool's purpose and limitations to manage expectations. Additionally, establishing consensus on a set of quality indicators, and on the thresholds that indicate when a GenAI tool is sufficiently robust, will be critical for responsible scaling of GenAI in clinical practice.
Garritsen, G.; den Ouden, M. E. M.; Beerlage-de Jong, N.; Kelders, S. M.
Technological innovations such as eHealth are vital for improving healthcare accessibility, quality, and sustainability. While most research addresses adoption at the individual or team level, less is known about organisational factors enabling sustainable transformation. Organisational readiness is a key determinant of success. The Organizational eHealth Readiness (OeHR) model, developed in Polish primary care, assesses five dimensions: Strategy, Competence, Culture, Structure, and Technology, but its applicability in Dutch healthcare remains unclear. This mixed-methods study evaluated the OeHR model in Dutch hospitals. A validated 32-item questionnaire, translated into Dutch, was completed by managers and implementation specialists of 15 top-clinical hospitals (n=22). Descriptive statistics, regression analyses, and Principal Component Analysis provided insight into how the five dimensions jointly reflect organisational readiness for eHealth. Three focus groups (n=14) in two hospitals explored construct interpretation, missing dimensions, and model usability. Qualitative data were analysed using deductive coding on OeHR dimensions and emergent themes to refine the questionnaire. Quantitative analyses identified organisational culture as the only significant predictor of subjective eHealth readiness, while other dimensions showed no independent effect. Open responses and focus groups confirmed the centrality of culture and suggested refinements to all components, including clearer definitions, structural flexibility, and attention to external factors. Overall, the OeHR model was valued for strategic guidance but required contextualisation for practical use. This study showed that the OeHR model provides a valuable framework for assessing eHealth readiness in Dutch hospitals, with cultural readiness emerging as the most influential yet conceptually ambiguous dimension. Strategy, competence, structural, and technological readiness mainly act as contextual enablers rather than direct predictors. The findings highlight the need to refine definitions, reduce overlap, and explore additional layers such as personal, operational, and societal readiness. Strengthening conceptual clarity and developing context-sensitive tools could enhance applicability and guide hospitals in translating readiness into digital transformation. Author summary: eHealth, such as home monitoring and online consultations with healthcare providers, is becoming increasingly important. Hospitals want to implement eHealth effectively, but this is only possible if the organisation is ready for it. The Organisational eHealth Readiness model helps to assess how "eHealth-ready" an organisation is. This model looks at five components: strategy, competencies, culture, structure, and technology. In our research, we examined whether this model is also suitable for Dutch hospitals and which factors were still missing. We did this using a questionnaire among professionals and three focus groups, all involved in the implementation of eHealth. This showed that all components of the model are important, but that the culture of the organisation plays a central role. If employees are open to change and innovation, this acts as a driving force for eHealth. The other components, such as technology and strategy, appear to be primarily preconditions: necessary, but not sufficient to enable real change.
Participants found the model useful, but felt that some factors were missing, such as leadership, collaboration, flexibility and the influence of legislation and regulations. They also mentioned that the organisation must have the capacity to cope with change effectively. Adapting the model to include these points could better support hospitals in healthcare transformation in the future.
Yip, A.; Craig, G.; White, N. M.; Cortes-Ramirez, J.; Shaw, K.; Reddy, S.
Purpose: To evaluate whether large language models (LLMs) can enhance clinician-patient communication by simplifying radiology reports to improve patient readability and comprehension. Methods: A randomised controlled trial was conducted at a single healthcare service for patients undergoing X-ray, ultrasound, or computed tomography between May 2025 and June 2025. Participants were randomised in a 1:1 ratio to receive either (1) the formal radiology report only or (2) the formal radiology report and an LLM-simplified version. Readability scores, including the Simple Measure of Gobbledygook, Automated Readability Index, Flesch Reading Ease, and Flesch-Kincaid grade level, were calculated for both reports. Patient readability and comprehension levels, together with the factual accuracy and hallucination rates of the LLM, were assessed using a combination of binary and 5-point Likert scales, open-ended survey questions, and independent review by two radiologists. Results: 59/120 patients were randomised to receive both the formal and LLM-simplified radiology reports. Readability of LLM-simplified reports significantly improved, with the reading level required for formal reports equivalent to a university standard (11th-13th grade) compared to a middle-school standard (5th-9th grade) for simplified reports (rank biserial correlation = 0.83, p < 0.001). Patients with both reports demonstrated a significantly greater comprehension level, with 95% reporting an understanding level greater than 50%, compared with 46% without the simplified report (rank biserial correlation = 0.67, p < 0.001). All LLM-simplified reports were considered at least somewhat accurate, with a minimal hallucination rate of 1.7%. Importantly, no hallucinations resulted in potential patient harm. 118/120 (98.3%) patients expressed interest in simplified radiology reports being included in future clinical practice. Conclusion: This study provides evidence that LLMs can simplify radiology reports to an accessible level of readability with minimal hallucination. LLMs improve both ease of readability and comprehension of radiology reports for patients. Therefore, the rapid advancement of LLMs shows strong potential for enhancing patient-radiologist communication as patient access to electronic health records is increasingly adopted. Highlights:
- Radiology reports can be complex and difficult for patients to read and interpret
- Strong patient demand exists for simplified radiology reports
- Large language models (LLMs) such as GPT-4o show promise in simplifying radiology reports
- LLMs credibly simplify radiology reports with minimal hallucination rates
- LLMs improve both patient readability and comprehension of radiology reports
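The four readability indices named in the methods are all computable with the off-the-shelf textstat package. A toy sketch contrasting a formal report sentence with a simplified one (the example sentences are invented, not from the study):

```python
# pip install textstat -- each function returns a readability score or a
# US grade-level estimate. These indices are designed for longer texts, so
# a two-sentence toy example only illustrates the direction of the effect.
import textstat

formal = ("There is mild bibasilar atelectasis without focal consolidation, "
          "pleural effusion, or pneumothorax.")
simplified = ("The lower parts of your lungs are slightly less inflated than "
              "normal. There is no sign of infection or fluid.")

for label, text in [("formal", formal), ("simplified", simplified)]:
    print(label,
          "| SMOG:", textstat.smog_index(text),
          "| ARI:", textstat.automated_readability_index(text),
          "| Flesch ease:", textstat.flesch_reading_ease(text),
          "| FK grade:", textstat.flesch_kincaid_grade(text))
```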
Park, J.-H.; Kim, S.-Y.
Background: South Korea's healthcare system, while technologically advanced, faces persistent inefficiencies in health information exchange (HIE) across its fragmented hospital network. Blockchain technology has been proposed as a decentralised infrastructure for secure, interoperable health data sharing, yet empirical evidence quantifying the efficiency gains attributable to blockchain-based HIE systems at the hospital network level remains absent. Traditional performance metrics fail to distinguish between technical inefficiency (suboptimal use of existing resources) and frontier shifts (technological improvements enabling new performance levels). Objective: To estimate the technical efficiency of health information exchange across South Korean hospital networks and to quantify the efficiency differential attributable to blockchain-enabled versus conventional HIE platforms using Stochastic Frontier Analysis (SFA) with Bayesian Model Averaging (BMA). Methods: We conducted a panel study of 247 hospital networks (comprising 1,842 individual hospitals) across all 17 South Korean provinces and metropolitan cities over 16 quarters (Q1 2021-Q4 2024). The dataset was constructed from the Health Insurance Review and Assessment Service (HIRA) claims database, the Korean Health Information Exchange registry, and hospital-reported digital infrastructure surveys. We specified a translog stochastic production frontier with time-varying inefficiency, where HIE output (a composite index of data completeness, exchange volume, interoperability score, and clinical decision support utilisation) was modelled as a function of inputs including IT staff, digital infrastructure investment, electronic health record maturity, and network connectivity. Bayesian Model Averaging over 2^18 candidate specifications addressed model uncertainty in covariate selection for the inefficiency determinants equation. The blockchain treatment effect was estimated using a control function approach with Mundlak-Chamberlain correlated random effects to address endogenous adoption timing. Secondary analyses examined whether blockchain's efficiency impact varied by hospital network characteristics using a latent class stochastic frontier model. Results: The mean technical efficiency of HIE across all networks was 0.67 (SD = 0.14), indicating that the average network achieved only 67% of its frontier potential output given its input levels. Blockchain-adopting networks (n = 83, 33.6%) demonstrated significantly higher mean technical efficiency (0.78, 95% CrI: 0.75-0.81) compared to conventional HIE networks (0.61, 95% CrI: 0.58-0.64), yielding an efficiency differential of 0.17 (95% CrI: 0.13-0.21). BMA identified blockchain adoption (posterior inclusion probability [PIP]: 0.98), network size (PIP: 0.94), EHR maturity level (PIP: 0.91), and dedicated IT governance structures (PIP: 0.87) as the most robust determinants of technical efficiency. The control function approach confirmed that the blockchain effect was robust to endogeneity concerns (Hausman test p = 0.34). Latent class analysis identified three distinct efficiency regimes: "Digital Leaders" (24.3% of networks, mean efficiency: 0.84), "Transitioning Networks" (48.6%, mean efficiency: 0.68), and "Digital Laggards" (27.1%, mean efficiency: 0.49). Blockchain adoption shifted the probability of Digital Leader classification by 31.2 percentage points (95% CrI: 24.8-37.6).
Conclusions: South Korean hospital networks operating with blockchain-enabled HIE infrastructure achieve substantially higher technical efficiency in health data exchange, with the efficiency advantage persisting after accounting for model uncertainty and endogenous adoption. These findings provide the first large-scale econometric evidence supporting blockchain's operational value in healthcare information infrastructure and have direct implications for South Korea's ongoing national digital health strategy and for international health systems considering blockchain-based interoperability solutions.
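For readers outside econometrics, "technical efficiency of 0.67" means a network produces 67% of the output a frontier (best-practice) network would achieve with the same inputs. The sketch below illustrates that idea with corrected OLS (COLS), a deliberately crude stand-in for the paper's Bayesian translog SFA, on simulated data:

```python
# COLS sketch: fit an average production line, shift it up to the best
# performer to approximate a frontier, and read efficiency off the residuals.
import numpy as np

rng = np.random.default_rng(3)
n = 247
x = rng.normal(size=n)                    # log composite input per network
u = rng.exponential(0.4, size=n)          # one-sided inefficiency, u >= 0
y = 1.0 + 0.8 * x - u                     # log output sits below the frontier

slope, intercept = np.polyfit(x, y, 1)    # ordinary least squares fit
resid = y - (intercept + slope * x)
te = np.exp(resid - resid.max())          # efficiency relative to best unit
print(f"Mean technical efficiency: {te.mean():.2f}")
```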
Abdollahyan, M.; BCNB-BCI; Chelala, C.
Common data models (CDMs) are essential for health data standardisation, which facilitates the governance and management of data, improves data quality, and enhances the findability, accessibility, interoperability, and reusability of data. They allow researchers to efficiently integrate health datasets and perform joint analysis on them, promoting collaboration and maximising translation of research outputs for patients' benefit. We describe the process of transforming the biobank data for over 2,850 donors recruited at the Barts Cancer Institute (BCI) site of the Breast Cancer Now Biobank (BCNB) - the UK's first national breast cancer biobank hosting longitudinal biospecimens and associated clinical, genomic and imaging data - into the Observational Medical Outcomes Partnership (OMOP) CDM. Our transformation pipeline achieved high coverage, with 83% of source concepts mapped, and our OMOP CDM achieved a total pass rate of 100% in quality assessments. We present the breast cancer characteristics of the resultant patient cohort. We report several challenges faced during the transformation process, explain how we addressed them, and discuss the strengths and limitations of adopting the OMOP CDM for breast cancer research. The OMOP-mapped BCNB-BCI dataset is a valuable resource that can now be explored and analysed alongside other health datasets.
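The "83% of source concepts mapped" figure is a coverage statistic from the extract-transform-load step: join each source code against the vocabulary map and count the hits. A minimal pandas sketch, with table and column names that are illustrative rather than the actual BCNB schema:

```python
import pandas as pd

# Toy source codes and a toy lookup in the spirit of OHDSI's
# SOURCE_TO_CONCEPT_MAP; the concept IDs below are placeholders.
source = pd.DataFrame({"source_code": ["C50.9", "8500/3", "ER+", "LOCAL-01"]})
concept_map = pd.DataFrame({
    "source_code": ["C50.9", "8500/3", "ER+"],
    "target_concept_id": [1001, 1002, 1003],
})

merged = source.merge(concept_map, on="source_code", how="left")
coverage = merged["target_concept_id"].notna().mean()
print(f"Mapped {coverage:.0%} of source concepts")          # here: 75%
unmapped = merged.loc[merged["target_concept_id"].isna(), "source_code"]
print("Needs manual mapping:", list(unmapped))              # ['LOCAL-01']
```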
Edara, R.; Khare, A.; Atreja, A.; Awasthi, R.; Highum, B.; Hakimzadeh, N.; Ramachandran, S. P.; Mishra, S.; Mahapatra, D.; Shree, S.; Bhattacharyya, A.; Singh, N.; Reddy, S.; Cywinski, J. B.; Khanna, A. K.; Maheshwari, K.; Papay, F. A.; Mathur, P.
Background: Breakthroughs in model architecture and the availability of data are driving transformational artificial intelligence in healthcare research at an exponential rate. The shift in the types of models used can be attributed to the multimodal properties of foundation models, which better reflect the inherently diverse nature of clinical data, and to advancing model implementation capabilities. Overall, the field is maturing from exploratory development towards application in real-world evaluation and implementation, spanning both generative and predictive AI. Methods: A database search in PubMed was performed using the terms "machine learning" or "artificial intelligence" and "2025", with the search restricted to English-language human-subject research. A BERT-based deep learning classifier, pre-trained and validated on manually labeled data, assessed publication maturity. Five reviewers then manually annotated publications for healthcare specialty, data type, and model type. Systematic reviews, duplicates, pre-prints, robotic surgery studies, and non-human research publications were excluded. Publications employing foundation models were further analyzed for their areas of application and use cases. Results: The PubMed search yielded 49,394 publications, a near-doubling from 28,180 in 2024, of which 3,366 were classified as mature. 2,966 were included in the final analysis after exclusions, compared to 1,946 in 2024. Imaging remained the dominant specialty (976 publications), followed by Administrative (277) and General (251). Traditional text-based LLMs (1,019) led model usage, but multimodal foundation models surged from 25 publications in 2024 to 144 in 2025, and deep learning models also increased substantially (910). For the first time in our annual review, publications using classical machine learning models declined (173). Image remained the predominant data type (53.9%), followed by text (38.2%), with a notable increase in audio (1.2%) coinciding with the adoption of multimodal models. Across foundation model publications, Imaging (110), Head and Neck (92), Surgery (64), Oncology (55), and Ophthalmology (49) were leading specialties, while Administrative and Education categories remained high-volume contributors driven predominantly by LLM-based research. Conclusion: 2025 signals a meaningful maturation of the healthcare AI research field, with publication volumes nearly doubling, classical ML yielding to higher-capacity foundation models, and the field rapidly moving beyond traditional text-based LLM capabilities toward multimodal models. While imaging continues to lead in research output, the growth of multimodal models across clinical specialties suggests the field is approaching an inflection point where AI systems can more closely mirror the complexity of real-world clinical practice.
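The screening step described in the methods, a BERT-based classifier triaging tens of thousands of abstracts for maturity, maps directly onto the Hugging Face text-classification pipeline. The sketch below shows the API shape only; it loads a public sentiment checkpoint as a stand-in because the authors' custom classifier is not published:

```python
from transformers import pipeline

# Stand-in checkpoint purely to demonstrate the interface; the study used
# its own fine-tuned, manually validated maturity classifier.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

abstracts = [
    "We prospectively deployed a sepsis prediction model across 12 ICUs.",
    "We propose a novel attention mechanism and report benchmark accuracy.",
]
for text in abstracts:
    result = classifier(text, truncation=True)[0]
    print(f"{result['label']} ({result['score']:.2f}) :: {text[:55]}")
```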
Farquhar, H. L.
Natural language processing was applied to 3,586 Australian health practitioner tribunal decisions (1999-2026) to identify patterns in professional misconduct, outcomes, and temporal trends at a scale impractical through manual analysis. A text classification approach categorised 2,428 disciplinary decisions across seven misconduct types with acceptable accuracy for the major categories (per-class F1 0.47-0.82). Boundary violations were the most prevalent misconduct type (30.2%), followed by dishonesty/fraud (29.7%) and professional conduct breaches (28.0%). Reprimand was the most common outcome (53.0%), followed by cancellation (40.2%). Significant increasing trends were identified for boundary violations, dishonesty/fraud, professional conduct breaches, and communication failures. Boundary violations were associated with higher cancellation odds (OR = 1.36, p < 0.001). Opioid medications appeared in 67% of prescribing misconduct decisions. Significant jurisdictional variation in both misconduct types and outcomes was observed, with large effect sizes between major jurisdictions. The findings provide an empirical foundation for monitoring disciplinary trends under the National Law.
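Per-class F1, the evaluation metric quoted above, is a one-liner to produce once a classifier's predictions are in hand. A sketch with scikit-learn on invented labels (category names follow the abstract):

```python
from sklearn.metrics import classification_report

y_true = ["boundary", "fraud", "conduct", "boundary", "prescribing", "fraud"]
y_pred = ["boundary", "fraud", "boundary", "boundary", "prescribing", "conduct"]

# Prints precision, recall, and F1 per misconduct category, making weak
# classes (per-class F1 as low as 0.47 in the study) immediately visible.
print(classification_report(y_true, y_pred, zero_division=0))
```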
Zafar, W.; Tavares, S.; Hu, Y.; Brubaker, L.; Green, J.; Mehta, S.; Grams, M. E.; Chang, A. R.
Background: Albuminuria is associated with increased risk of cardiovascular disease (CVD), heart failure, and progression of chronic kidney disease (CKD). Early detection of albuminuria through spot urine albumin-creatinine ratio (UACR) testing enables more accurate risk stratification and timely use of preventative therapies, yet testing rates remain unacceptably low in the hypertension population. Methods: We evaluated two EHR-embedded clinical decision support (CDS) strategies at Geisinger Health System intended to increase UACR testing in individuals with hypertension: an OurPractice Advisory (OPA) from January 2022 to August 2022, and a Health Maintenance Topic (HMT) in the Care Gaps section of Storyboard from August 2022 that continues to date. We evaluated UACR rates from 2020 to 2023 in Geisinger primary care and compared them to a control group of healthcare systems in the Optum Labs Data Warehouse (OLDW). Patients were excluded if they had UACR testing in the preceding 3 years, had diabetes or CKD, or were receiving palliative/hospice care. Results: We included 58,876 individuals in Geisinger (mean age 59.4 years, 49.6% female) and 1,427,754 in OLDW (61.0 years, 49% female). UACR testing in Geisinger (2.97% in 2020; 2.8% in 2021; 9.7% in 2022; 17.5% in 2023) showed a significant increase compared to the control health systems (2.08%, 2.26%, 3.35%, and 3.40%, respectively). Results were consistent after adjusting for age, sex, and race. Conclusion: The OPA increased UACR testing ~3-fold, whereas the HMT was associated with further improvements (~6-fold vs. baseline) among those with hypertension, suggesting an important role for CDS design in closing care gaps.
Lin, Y.; Ding, R.; Tabatabaei, S. M. H.; Tupper, H. I.; Moghanaki, D.; Schussel, B. H.; Aberle, D. R.; Hsu, W.; Prosper, A. E.
Objectives: Lung cancer screening (LCS) is the only screening test incorporating behavioral risk factors into eligibility determination. However, collecting the necessary smoking history data has been challenging, limiting screening uptake. In this study, we evaluated how a program coordinator's detailed shared decision-making (SDM) impacted smoking data reliability. Methods: Patients who underwent a baseline screening low-dose CT between July 31, 2013, and August 25, 2023, were stratified into pre- and post-intervention cohorts. The intervention, implemented on July 31, 2017, was a comprehensive pre-CT smoking history assessment with SDM by an LCS program coordinator. We compared the completeness and concordance of smoking history data between clinician documentation and patient self-report. Results: Among 3795 patients, 670 (18%) were pre- and 3125 (82%) were post-intervention. Having a coordinator reduced missing smoking data (p < 0.001) but did not eliminate it. Both groups showed high concordance between clinician-documented and self-reported smoking status (pre: kappa = 0.84, 95% confidence interval [CI] 0.79-0.89; post: kappa = 0.84, 95% CI 0.83-0.86). Correlations strengthened for smoking duration (rho = 0.71 vs. 0.65, p = 0.026) and years since quitting (rho = 0.83 vs. 0.80, p = 0.21) after involving a coordinator. Correlations for smoking intensity and pack-years remained fair (rho < 0.6). LCS eligibility based on self-reported smoking history increased from 46.0% (308/670) pre- to 64.1% (2003/3125) post-intervention, below the 100% eligibility using clinician-documented history. Conclusions: Smoking data reliability improved after a dedicated LCS program coordinator implemented a smoking history assessment. Meanwhile, challenges remained with the ascertainment of total pack-years. Detailed probing and patient education may be insufficient to overcome challenges in assessing smoking intensity.
Guo, Y.; Hu, D.; Zhou, Y.; Lyu, T.; Sutari, S.; Tam, S.; Chow, E.; Perret, D.; Pandita, D.; Zheng, K.
Objective: Ambient artificial intelligence (AI) tools are increasingly adopted in clinical practice. This study investigated whether and how clinicians edit AI-generated drafts, and the linguistic differences between AI drafts and clinician-finalized notes. Materials and Methods: This retrospective study analyzed real-world data from ambulatory clinics at a large academic health system spanning two vendor deployments. We quantified clinicians' editing behavior using the Myers diff algorithm to compare AI drafts and final documentation. We then applied statistical and linguistic analysis to study factors associated with the frequency and intensity of editing across note sections, turnaround time, clinician characteristics, and encounter types. Results: Across 23,760 notes that included one or more ambient AI sections, 84.4% were edited by clinicians before signing off. While rates of unedited notes differed across note sections and care settings, the dominant source of variation was individual clinician practice style rather than specialty-level norms. Notes signed after 24 hours had lower overall edit intensity. The final versions showed small but statistically significant linguistic changes, exhibiting slightly higher lexical diversity and modest changes in readability. Editing was most intensive in the assessment and plan section and varied across specialties. Conclusion and Discussion: A majority of AI-drafted clinical notes were edited by clinicians, although the editing rate varied across note sections, medical specialties, and individual clinicians. Future research is needed to further analyze this editing behavior to inform improvements in AI-assisted clinical documentation, in pursuit of better documentation quality, efficiency, and clinician satisfaction.
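The edit quantification described above rests on diffing an AI draft against the signed note. The study uses the Myers algorithm; the sketch below substitutes Python's built-in difflib (a different matching algorithm) to illustrate the same token-level edit-intensity idea on invented note text:

```python
import difflib

draft = "Patient reports mild headache for two days. No visual changes."
final = ("Patient reports moderate headache for three days. "
         "No visual changes. Advised hydration and rest.")

sm = difflib.SequenceMatcher(None, draft.split(), final.split())
edited = sum(max(i2 - i1, j2 - j1)
             for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal")
intensity = edited / max(len(draft.split()), len(final.split()))
print(f"Edit intensity: {intensity:.0%} of tokens touched")
```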